229 research outputs found

Embeddings for the lexicon: modelling and representation

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
Publication date: 24/04/2023

Towards the Integration of Sign Languages Data in the Linguistic Linked Open Data Cloud

Author: Declerck Thierry
Publication date: 01/01/2022

Purpose: In the field of electronic lexicography, there is increasing interest in ways to represent and interlink lexical data originating from different modalities. This topic is discussed in particular within initiatives and projects concerned with representing lexical information in a Linked Data (LD) compliant format, so that it can be published in the Linguistic Linked Open Data (LLOD) cloud. In this context, we observe that Sign Language (SL) lexical data are not currently represented in the datasets included in the LLOD cloud. Looking at the “Overview of Data-sets for the Sign Languages of Europe”, published by the “Easier” European project, we also see no mention of any dataset being available in an LD-compliant format. We therefore investigate ways of representing SL data in the LLOD cloud and of linking them to other types of language data already available in an LD-compliant format.

Mykolas Romeris University Institutional Repository

Natural Language Dialogue Service for Appointment Scheduling Agents

Author: Busemann Stephan
Declerck Thierry
Diagne Abdel Kader
Dini Luca
Klein Judith
Schmeier Sven
Publication date: 01/01/1997

Appointment scheduling is a problem faced daily by many individuals and organizations. Cooperating agent systems have been developed to partially automate this task. In order to extend the circle of participants as far as possible, we advocate the use of natural language transmitted by e-mail. We describe COSMA, a fully implemented German-language server for existing appointment scheduling agent systems. COSMA can handle multiple dialogues in parallel and accounts for differences in dialogue behaviour between human and machine agents. Natural language (NL) coverage of the sublanguage is achieved through both corpus-based grammar development and the use of message extraction techniques.
Comment: 8 or 9 pages, LaTeX; uses aclap.sty, epsf.te

arXiv.org e-Print Archive

Extraction and Processing of Rich Semantics from Medical Texts

Author: Declerck Thierry
Denecke Kerstin
Deng Yihan
Publication date: 01/01/2016

Berner Fachhochschule: ARBOR

COST Action “European network for Web-centred linguistic data science” (NexusLinguarum)

Author: Declerck Thierry
Gracia Jorge
McCrae John P.
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/09/2020

We present the current state of the large “European network for Web-centred linguistic data science”. In its first phase, the network has set up several working groups to deal with specific topics. The network has also already implemented a first round of Short Term Scientific Missions (STSM).
Work presented here was supported in part by the COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, the project Prêt-à-LLOD, under grant agreement no. 825182, and the ELEXIS project, under grant agreement no. 731015.

Repositorio Institucional de la Universidad de Alicante

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Ontologies for a Global Language Infrastructure

Author: Buitelaar Paul
Declerck Thierry
Hayashi Yoshihiko
Monachini Monica
Publication venue: Vassar, USA

Human language technologies have matured considerably, and a rapidly growing range of language data resources, together with natural language processing (NLP) tools and systems, is now available. In this situation, the need for a global language infrastructure (GLI) is becoming increasingly evident if the re-usability of these resources is to be ensured. A GLI is essentially an open, web-based software platform on which tailored language services can be efficiently composed, disseminated and consumed. Such an infrastructure is also expected to facilitate the further development of language data resources and NLP functionalities. The aims of this paper are twofold: (1) to discuss the necessity of ontologies for a GLI, and (2) to outline a high-level configuration of the ontologies, which are integrated into a comprehensive language service ontology. To these ends, this paper first explores the dimensions of a GLI and then presents a triangular view of a language service, from which the necessary ontologies are derived. The paper also examines relevant ongoing international standardization efforts such as LAF, MAF, SynAF, DCR and LMF, and discusses how these frameworks are incorporated into our comprehensive language service ontology. The paper concludes by stressing the need for international collaboration on the development of a standardized language service ontology.

PUblication MAnagement

Considerations about Uniqueness and Unalterability for the Encoding of Biographical Data in Ontologies

Author: Declerck Thierry
Sprugnoli Rachele
Publication date: 01/01/2018

This paper results from observations made while studying ontological and linked data-based approaches to the encoding of biographical data. Based on certain issues that we discovered, which are described here, we call for collaborative work towards guidelines for modelling biographical data in the standard Semantic Web representation languages. The need for guidelines became even clearer after reading an article describing various types of errors in biographical data encoding that had been caused by an unsuitable use of the owl:sameAs property when referring to the linked data-based descriptions of the lives of two literary authors. In this context, there is also a need to agree on the core element of which a biographical description consists. More specifically, we aim to determine the “biographical unit” that should be modelled first and to which all related information should be linked through corresponding semantic properties. We also discuss the need for defining and using synchronic versus diachronic properties associated with the modelled biographical unit. On this point, we conclude that, for the description of a biographical unit, there are probably no properties whose values remain unaltered over time. This is particularly true if provenance information is taken into account, since it can provide contrasting values that may nonetheless each be correct from different points of view.

Archivio istituzionale della Ricerca - Università degli Studi di Parma

PubliCatt

Archivio della ricerca - Fondazione Bruno Kessler
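The owl:sameAs problem raised in the biographical-data abstract above can be made concrete. Under OWL semantics, owl:sameAs states that two IRIs denote the same individual, so every statement about one also holds for the other. The following is a minimal plain-Python sketch (no RDF library), assuming hypothetical IRIs and birth years that do not come from the paper: it groups IRIs connected by owl:sameAs into identity classes with a union-find, pools their property values, and reports properties that should have a single value but end up with several once two distinct people are wrongly linked.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); the IRIs and values are
# invented for illustration and do not come from the paper.
TRIPLES = [
    ("ex:AuthorA", "ex:birthYear", "1840"),
    ("ex:AuthorB", "ex:birthYear", "1871"),
    ("ex:AuthorA", "owl:sameAs", "ex:AuthorB"),  # the (wrong) identity link
]


def find(parent, x):
    """Return the representative of x's identity class (with path halving)."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their property values.

    owl:sameAs is symmetric and transitive, so its closure partitions IRIs
    into identity classes; union-find computes that partition.
    """
    parent = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(parent, s)] = find(parent, o)
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged[find(parent, s)][p].add(o)
    return merged


def conflicts(merged, functional_props):
    """List (representative, property, values) where a property expected to
    have a single value ends up with several after merging."""
    return [
        (rep, p, sorted(vals))
        for rep, props in merged.items()
        for p, vals in props.items()
        if p in functional_props and len(vals) > 1
    ]


print(conflicts(merge_same_as(TRIPLES), {"ex:birthYear"}))
# prints [('ex:AuthorB', 'ex:birthYear', ['1840', '1871'])]
```

Dropping the owl:sameAs triple leaves two separate individuals and no conflict, which is the behaviour a more cautious linking property would preserve.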

Ontology Lexicalisation: The lemon Perspective

Author: Buitelaar Paul
Cimiano Philipp
Declerck Thierry
McCrae J.
Montiel-Ponsoda Elena
Publication venue: Facultad de Informática (UPM)
Publication date: 10/11/2011

Ontologies (Guarino 1998) capture knowledge but fail to capture the structure and use of terms in expressing and referring to this knowledge in natural language. The structure and use of terms is the concern of terminology as well as lexicology. In recent years, the relevance of terminology in knowledge representation has been recognized again (for example, with the advent of SKOS), but less consideration has been given to lexical and linguistic issues in knowledge representation (Buitelaar 2010).

Archivo Digital UPM

The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions

Author: Declerck Thierry
Herrero Zazo María
Martínez Fernández Paloma
Segura-Bedmar Isabel
Publication venue: Elsevier BV
Publication date: 01/01/2013

The management of drug-drug interactions (DDIs) is a critical issue resulting from the overwhelming amount of information available on them. Natural Language Processing (NLP) techniques can provide an interesting way to reduce the time healthcare professionals spend reviewing biomedical literature. However, NLP techniques rely mostly on the availability of annotated corpora. While there are several corpora annotated with biological entities and their relationships, there is a lack of corpora annotated with pharmacological substances and DDIs. Moreover, other works in this field have focused on pharmacokinetic (PK) DDIs only, not on pharmacodynamic (PD) DDIs. To address this problem, we have created a manually annotated corpus consisting of 792 texts selected from the DrugBank database and 233 Medline abstracts. This fine-grained corpus has been annotated with a total of 18,502 pharmacological substances and 5028 DDIs, including both PK and PD interactions. The quality and consistency of the annotation process have been ensured through the creation of annotation guidelines and evaluated by measuring the inter-annotator agreement between two annotators. The agreement was almost perfect (Kappa up to 0.96 and generally over 0.80), except for the DDIs in the Medline database (0.55-0.72). The DDI corpus has been used in the SemEval 2013 DDIExtraction challenge as a gold standard for evaluating information extraction techniques applied to the recognition of pharmacological substances and the detection of DDIs in biomedical texts. DDIExtraction 2013 attracted wide attention, with a total of 14 teams from 7 different countries.
For the task of recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while for the detection and classification of DDIs, the best result was an F1 of 65.1%.
Funding: This work was supported by the EU project TrendMiner [FP7-ICT287863], by the project MULTIMEDICA [TIN2010-20644-C03-01], and by the Research Network MA2VICMR [S2009/TIC-1542].

Elsevier - Publisher Connector

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo
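The DDI corpus abstract above reports inter-annotator agreement as Kappa values for two annotators. The abstract does not name the exact variant; assuming Cohen's kappa, the usual measure for two annotators, the computation can be sketched as follows. The labels in the usage line are hypothetical and not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's own label
    distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal frequencies per label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of four candidate drug pairs.
print(cohen_kappa(["DDI", "DDI", "none", "none"], ["DDI", "none", "none", "none"]))
# prints 0.5
```

In the usage line the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is 0.5, so kappa is 0.5; values above 0.80, as reported for most of the corpus, indicate agreement far beyond chance.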

Proof of Concept of Ontology-based Query Expansion on Financial Domain

Author: Declerck Thierry
Martínez Fernández José Luis
Martínez Paloma
Moreno Schneider Julián
Publication venue: Sociedad Española Para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2013

This paper presents the use of an ontology in the financial domain for query expansion, with the goal of improving the results of a financial information retrieval (IR) system. The system is composed of an ontology and a Lucene index that stores and retrieves concepts identified through natural language processing. An initial evaluation has been performed with a limited set of queries, and the results show that ambiguity remains a problem when expanding a query. In some cases, choosing the appropriate entities when expanding a query (filtering by sector, company, market, etc.) helps to resolve this ambiguity.
This work was partially funded by the Trendminer project (EU FP7-ICT287863), the Monnet project (EU FP7-ICT 247176) and MA2VICMR (S2009/TIC-1542).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo
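The query expansion idea in the financial-domain abstract above, including the entity-type filtering used to reduce ambiguity, can be illustrated with a small sketch. It is not the paper's system: the ontology and Lucene index are replaced by a toy in-memory dictionary, and the concept identifiers, types and labels below are hypothetical. An ambiguous term such as "santander" matches both a company and a city; restricting expansion to companies and markets drops the irrelevant reading.

```python
# Toy ontology: concept id -> (entity type, set of lower-case labels).
# All entries are hypothetical stand-ins for a real financial ontology.
ONTOLOGY = {
    "ex:SantanderBank": ("company", {"santander", "banco santander"}),
    "ex:SantanderCity": ("city", {"santander", "santander, cantabria"}),
    "ex:IBEX35": ("market", {"ibex 35", "ibex"}),
}


def expand_query(terms, ontology, allowed_types=None):
    """Return the (lower-cased) query terms plus the labels of every concept
    one of the terms names, keeping only concepts of the allowed types when
    a type filter is given."""
    terms = [t.lower() for t in terms]
    expanded = list(terms)
    for term in terms:
        for concept_type, labels in ontology.values():
            if term not in labels:
                continue
            if allowed_types is not None and concept_type not in allowed_types:
                continue  # skip readings of unwanted entity types
            for label in sorted(labels):
                if label not in expanded:
                    expanded.append(label)
    return expanded


print(expand_query(["Santander"], ONTOLOGY))
print(expand_query(["Santander"], ONTOLOGY, {"company", "market"}))
# prints ['santander', 'banco santander']
```

Without the filter, the first call also adds the city label "santander, cantabria", which is exactly the kind of ambiguity the evaluation reports; the filtered call keeps only the company reading.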